Soliton driven relaxation dynamics and universality in protein collapse 
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Protein collapse can be viewed as a dynamical phase transition, during which new scales and col- 
lective variables become excited while the old ones recede and fade away. This causes formidable 
computational bottle-necks in approaches that are based on atomic scale scrutiny. Here we consider 
an effective dynamical Landau theory to model the folding process at biologically relevant time and 
distance scales. We reach both a substantial decrease in the execution time and improvement in the 
accuracy of the final configuration, in comparison to more conventional approaches. As an example 
we inspect the collapse of HP35 chicken villin headpiece subdomain, where there are detailed molec- 
ular dynamics simulations to compare with. We start from a structureless, unbend and untwisted 
initial configuration. In less than one second of wall-clock time on a single processor personal com- 
puter we consistently reach the native state with 0.5 Angstrom root mean square distance (RMSD) 
precision. We confirm that our folding pathways are indeed akin those obtained in recent atomic 
level molecular dynamics simulations. We conclude that our approach appears to have the potential 
for a computationally economical method to accurately understand theoretical aspects of protein 
collapse. 
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INTRODUCTION 

Structural classification shows that folded proteins in 
the Protein Data Bank (PDB) [5] are built in a mod- 
ular fashion from a relatively small number of compo- 
nents pp. In SCOP [3] there are presently 1393 unique 
folds while CATH gj has 1282 different topologies. Both 
figures have remained unchanged since year 2008. Fur- 
thermore, according to [5] over 90% of all high resolu- 
tion PDB proteins can be modeled using no more than 
200 explicit soliton motifs as the modular blocks. This 
convergence in protein architecture proposes that in the 
vicinity of the native state the atomic level differences be- 
tween amino acids become less important in determining 
the fold. Instead the protein shape is dominated collec- 
tively, by interactions between a relatively small number 
of modular components that are made of several amino 
acids. 

Classical molecular dynamics (MD) [5]- [9] remains the 
only viable approach to describe truly atomic level pro- 
tein dynamics. With the best available precision MD 
models very short time and distance scale oscillations 
of individual atoms, including both their small ampli- 
tude thermal fluctuations and the detailed interactions 
between all the different atoms. But the process of pro- 
tein folding engages various different temporal and spa- 
tial scales. In particular there are several high energy 
barriers that can be overcome only by relatively slow and 
collective long range oscillations. These hurdles in scales 
and structures are the major technical bottle-necks in 
full atomic level descriptions of protein folding. As a 
consequence a detailed MD simulation of an entire fold- 
ing process remains a formidable task. Even in the case 
of relatively simple and short proteins such as the 35- 



residue subdomain of the villin headpiece (HP35) where 
detailed information on the folding dynamics is now be- 
coming available, a detailed simulation can take several 
months and even years to complete fT0]-|15j. 

In order to enable practical modeling of the folding, 
several different effective approaches have been intro- 
duced |T5]. Examples range from the coarse-grained Go 
model and its variants |16| to carefully crafted energy 
functions such as UNRES [T7] that explicitely aim to av- 
erage over those degrees of freedom that are considered to 
be non-essential for attaining thermodynamically stable 
structures. 

In addition to technical issues, there are also impor- 
tant conceptual matters that need to be addressed in 
selecting the pertinent coarse grained physical degrees of 
freedom. In particular, a folding pathway from a struc- 
tureless straight protein backbone into a biologically ac- 
tive collapsed conformation involves a phase transition. 
During a phase transition a physical system undergoes a 
drastic metamorphosis, old factors loose their relevance 
while new actors enter the stage. In the case of protein 
folding, the highly localized and short time scale atomic 
oscillations become replaced by much slower collective 
motions of extended modular structures. We propose to 
overcome this dual problem of phases and scales in terms 
of an effective dynamical Landau-type theory. In analogy 
of e.g. the effective Landau-Ginzburg theory that models 
a superconductor in terms of collective Cooper pairs and 
vortex lines in lieu of the individual electrons and photons 
of the microscopic BCS theory, we aim to describe the dy- 
namics of protein collapse as a non-conservative Marko- 
vian relaxation process of the relevant modular compo- 
nents. In our approach, a protein consistently folds to its 
PDB structure with a subatomic precision that matches 
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and even exceeds the experimentally determined B-factor 
Debye-Waller fluctuation distances. Since our descrip- 
tion only engages those time and distance scales that 
characterize biologically relevant motions, we can reach 
a sub-second execution time of the entire folding process 
even with an ordinary desktop computer. 



MODEL 



The effective Landau energy is [T5], Q35] 
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Here Kj and Tj are the Frenet bond and torsion angles 
of the C a backbone, the summation extends over all 
backbone angles (kn+i — 0), see [20] for details. Once 
these angles are known the protein backbone can be con- 
structed by solving the discrete Frenet equation [5D] . The 
first sum defines the Hamiltonian of the discrete nonlin- 
ear Schrodinger equation [5T] . In the second sum the first 
two terms are the conserved hclicity and momentum, re- 
spectively. The last term is the Proca mass. The func- 
tional form is firmly anchored in the elegant mathe- 
matical structure of integrable models [3T], it describes 
the protein backbone in terms of universal physical argu- 
ments [18] . The various parameters have constant values 
over each of the soliton motifs i.e. they are characteris- 
tic only to an entire supersecondary structure such as a 
helix- loop-helix |22) . 

In [23] it has been shown that ([I]) supports solitons 
as classical solutions. In [5] it has been shown that over 
90% of all PDB proteins with resolution better than 1.5 A 
can be described in a modular fashion in terms of 200 ex- 
plicitely constructed soliton profiles. The soliton emerges 
as follows: We first eliminate the variable 7V in favor of 
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If the value of Tj falls outside of the fundamental domain 
[— 7r, 7r] we redefine it modulo 2-7T. Using the relation ([2| 
and selecting a = — 1.0 in we then get 

k 4+ i-2k 1 + k 1 _i = U'[Ki]Ki = 2 Hi (i=l,...,N) 

. (3) 

where we define kq — kn+i — 0. This is a generalization 
of the DNLS equation with 
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With different parameter values its (dark) soliton de- 
scribes various protein conformations [5]. But as it 
stands, the soliton models only static PDB configura- 
tions. We now wish to extend this approach into a de- 
scription of the actual dynamical process of protein fold- 
ing. Starting from an unbiased initial configuration we 
aim to reach a soliton arrangement that models the de- 
sired static PDB protein with sub- Angstrom accuracy. 

We shall initiate our simulations with a structureless 
configuration, an unbent and untwisted backbone. This 
configuration resides in the phase where the radius of 
gyration scales with Hausdorff dimension dn = 1. In 
the parameter c characterizes the average strength of 
hydrogen bonds along the backbone. The ensuing contri- 
bution to energy is largely responsible to the formation 
of regular secondary structures and the subsequent tran- 
sition into the collapsed dn ~ 1/3 phase. To conform 
with our initial configuration we start the simulation by 
setting all c = initially. During the early stage of the 
simulation we then introduce the hydrogen bond interac- 
tions by swiftly increasing these parameters to their final 
values, in a uniform manner. At that moment we ob- 
serve the initial formation of regular secondary structures 
such as a-helices and /^-strands. This is quickly followed 
by soliton (loop) formation, either by local soliton pair 
production or by soliton transport throught global defor- 
mations. This effectuates a rapid collapse into a molten 
globule i.e. a configuration in the dn ~ 1/3 phase that 
then proceeds more slowly towards the native state. 

As in any phase transition simulation, we need to avoid 
supercooling that may critically slow down the simula- 
tion. For this we utilize the parameter a in 0. De- 
pending on sign, we interpret it as either a ferromagnetic 
or an antiferromagnetic coupling along a continuous spin 
Ising chain of the Kj. The antiferromagnetic order mod- 
els the folding nuclei that initiate the phase transition, 
these are essentially the soliton centers. We choose the 
uniform ferromagnetic coupling a = —1.0 for all except 
those bonds where we foresee the eventual location of the 
center of a soliton. At the putative soliton locations we 
introduce an initial repulsive antiferromagnetic coupling 
with a = +1.0. At the first stages of the simulation, 
in parallel with the introduction of the hydrogen bond 
interactions, we remove the folding nuclei by decreasing 
the strength of the antiferromagnetic couplings so that 
we arrive at the uniform value a = —1.0 along the entire 
backbone. During the entire folding process all other pa- 
rameters remain intact. With this initial preparation of 
the hydrogen bond interactions and with the transient 
introduction of folding nuclei in our otherwise homoge- 
neous backbone, we avoid supercooling into misfolded 
states with their misplaced or extraneous solitons and 
soliton-soliton pairs. Even if such states are metastable 
with an energy that exceeds the energy of the native 
state and eventually decay, they can have a very long life- 
time and substantially slow down the simulations. We 
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remind that in actual proteins the hydrogen bonds are 
similarly produced during the folding process. There are 
also natural inhomogeneities in the amino acid structures 
that act as folding nuclei. Proline is a good example. 

We propose that during biologically relevant temporal 
scales the conformational changes that drive the collapse 
can be described in terms of an appropriate dynamical 
universality class. For this we thermally average over all 
those very short time scale oscillations and tiny fluctua- 
tions of individual atoms that are irrelevant to the way 
how the folding progresses over a biologically relevant 
time period. The simplest and by far the most natural 
choice is to utilize a Markovian Monte Carlo time evolu- 
tion with the standard, universal heat bath probability 
distribution [H], [25] 

V = Y^~ X Wlth X = eXp{ ~fcT } (5) 

Here AE is the energy difference between consecutive 
MC time steps that we compute from ([!]). We choose 
the numerical value of kT so that we are in the collapsed 
dn ~ 1/3 phase. We have made runs at several differ- 
ent values of kT to confirm that there are no qualitative 
changes in our results, as long as dn ~ 1/3. Since 
is an approximation of the thermodynamical free energy, 
the parameters in ([I]) are a priori temperature depen- 
dent and at the moment we (still) lack a direct relation 
between kT and the physical temperature. 

During the time evolution we suffocate any potential 
rearrangement of covalent bonds along the backbone. For 
this we introduce a self-avoiding condition [19] that en- 
sures that during the folding process the distance be- 
tween any two backbone sites remains at least as large as 
the length of a backbone covalent bond. 

We emphasize that ([5| does not describe the atomic 
level details of the folding process. Such details are highly 
sensitive to the initial configuration including solvent and 
other environmental factors, to the extent that detailed 
knowledge of a particular atomic trajectory during the 
collapse can hardly have any real meaning. Instead the 
evolution determined by ^ describes the universal sta- 
tistical aspects of trajectories over biologically relevant 
scales, how the protein backbone proceeds during the 
dynamical phase transition from a general class of ini- 
tial configurations towards its native state. 

EXAMPLE 

As an example we consider the folding dynamics of 
the 35-residue subdomain of the villin headpiece (HP35). 
The villin is a small ultrafast folding protein that is sub- 
ject to intense studies by experiments, theory and simu- 
lations. The PDB code we use is 1YRF, it describes the 
crystallographic structure at 95K with 1.07 A resolution 
[27j . In Table I we list the relevant parameter values in 



11) together with the corresponding backbone sites. We 



parameter 


soliton-1 


soliton-2 


a 


1.0 


1.0 


Cl 


0.459712 


0.995867 


C2 


4.5533320 


9.408796 


mi 


1.504535 


1.550322 


7712 


1.512836 


1.535081 


K 


- 9.575214e-9 


-1.215692e-08 


d T 


-6.76965e-ll 


-7.840467e-08 


e T 


2.4378718e-8 


2.136684e-08 




6.769649e-10 


4.973244e-12 


RMSD (A) 


0.38 


0.32 



TABLE I: Parameter values for the two-soliton solution of 
JS]/ 1 , |3]) that describes the backbone of 1 YRF with a combined 
0.38A accuracy. The soliton-1 is located between sites 45-57 
(PDB indexing) and the soliton-2 is located between PDB sites 
58-73. Note that the definition of bond angle takes three 

and the definition of torsion angle r^i+i takes four sites. 

have determined the parameters by solving ([3| , ([2| to de- 
scribe the folded structure of 1YRF with an overall 0.38 
A RMSD accuracy. 

We start the folding simulation with an initial config- 
uration that is straight line, with all K^i+i and all t^+i 
equal to zero. This is a configuration with Hausdorff 
dimension dn = 1, and it minimizes (JlJ for c = i.e. 
when there are no hydrogen bond interactions. We turn 
on the hydrogen bonds by increasing c to the values in 
Table I during the first 700.000 steps. We also introduce 
the folding nuclei by selecting the initial values a = +1.0 
for two bonds that are located between sites 53-54 and 
61-62 corresponding to the centers of the two solitons 
in 1YRF. We convert these antiferromagnetic couplings 
into ferromagnetic a = —1.0 in tandem with turning on 
the hydrogen bonds c. After this initial preparation we 
have a random coil configuration. There is a rapid forma- 
tion of regular secondary structures and a collapse into 
a molten globule, followed by a relatively slow progress 
towards the final PDB configuration. 

Since we can reach the final configuration in less than 
one second of total execution time using a single core 
in a MacPro desktop computer, we are able to collect a 
large amount of statistical data to investigate the univer- 
sal aspects of folding pathways. We find that the folding 
proceeds in a very universal manner, the variations be- 
tween different runs are very small and the collapse pro- 
ceeds systematically through steps that are in line with 
the MD simulation in [15] . The collapse starts with the 
formation of the last helix III. The second loop and the 
middle helix II then appear, followed by the formation 
of helix I and the first loop. Towards the end of the col- 
lapse the helix I starts to stabilize at a slightly faster rate 
than the middle helix II. During the final stages the fold- 
ing process consists mainly from an adjustment of the 
middle helix II with the two adjacent loops. This helix 



formation is very similar to the observations made during 
MD simulations as reported in [13] ■ This can be seen by 
comparing our Figure 1 with the corresponding Figure 
in [13]. In Figure 2 we display three generic snap-shot 
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FIG. 1: The formation of helices during the collapse of 1YRF, 
averaged over 4000 trajectories following '131. The helix II is 
formed first, and towards the end of the collapse the helix II 
adjusts itself with the loops. 

configurations together with the final fold. In a typical 
simulation we arrive at a structure that deviates from the 
1YRF in PDB around 0.5 Angstrom in RMSD distance. 
In average, the C a carbons of the PDB configuration has 
B-factors that correspond to a Debye- Waller fluctuation 
distance around 0.4 A. Consequently we can entirely at- 
tribute the average RMSD distance of ~ 0.5A between 
our final configurations and the PDB structure to ther- 
mal fluctuations. 



DISCUSSION 

The protein folding problem endures as the pre- 
eminent unresolved conundrum in science. The major 
problem in any atomic level simulation relates to time 
scales and high energy barriers. These can only be over- 
come with coherent multi-atom collective motions, whose 
molecular dynamics description remains a formidable 
task. Here we have introduced a new paradigm for de- 
scribing and modeling protein collapse. We propose to 
address the large scale hurdles in terms of solitons in an 
effective Landau theory and to describe the ensuing col- 
lapse dynamics universally, by a non-conservative Marko- 
vian heat-bath evolution. We have demonstrated that in 
the case of 1YRF where MD simulations are available 
for comparison, the folding pathways obtained in our ap- 
proach are the same. Among the future challenges is to 
compute the soliton profiles directly from the individual 
amino acids, to facilitate predictive collapse simulations. 
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FIG. 2: A set of snap-shots of a generic folding pathway in 
our simulation, over the shadow of the PDB configuration. 
Color coding shows how the three helices are formed, a) The 
folding starts with formation of helix III. b ) After this, there 
is formation of the other two helices and loops, c) The folding 
proceeds with stabilization of middle helix and loops, d) The 
final fold and the PDB structure of 1 YRF generically coincide 
with RMSD accuracy of about 0.5 A. 
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